Use web2json agent to clean html for v2 project #88
Open
dreamGirl1996 wants to merge 2 commits into ccprocessor:main from
Conversation
1041206149
reviewed
Apr 14, 2026
Collaborator
Are the two files under notebooks/ scripts meant to be run on Spark?
Author
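These two files are not for Spark. They are helper scripts added for local debugging and for demonstrating the web2json flow in Jupyter/Notebook. They mainly initialize the notebook environment, assemble the configuration, and call the whole pipeline directly from the notebook.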
1041206149
reviewed
Apr 14, 2026
Collaborator
Your input is JSONL, so this script is needed to extract the html field content as the input to web2json, right?
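For reference, the extraction step being asked about could look roughly like the sketch below. It is only an illustration, not the script in this PR: the function name `iter_html_fields` is hypothetical, the field name `html` is taken from the comment above, and how the extracted strings are handed to web2json is deliberately left out.

```python
import json
from typing import Iterable, Iterator


def iter_html_fields(lines: Iterable[str], field: str = "html") -> Iterator[str]:
    """Yield the `field` value of each JSONL record that has a non-empty string there.

    Hypothetical helper: `lines` can be an open text file or any iterable of
    strings, one JSON object per line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # skip lines that are valid JSON but not objects
        html = record.get(field)
        if isinstance(html, str) and html:
            yield html


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        '{"html": "<p>hello</p>", "url": "u1"}',
        '{"url": "u2"}',
    ]
    for html in iter_html_fields(sample):
        print(html)
```

Records without an `html` field are skipped silently here; a real script may prefer to log or count them instead.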
1041206149
reviewed
Apr 14, 2026
Collaborator
What problem does the newly added code-simplification logic solve? Please explain it in a comment or in a Feishu document.
1041206149
reviewed
Apr 14, 2026
1041206149
reviewed
Apr 14, 2026
No description provided.